ABSTRACT

We infer the period ($P$) and size ($R_p$) distributions of Kepler transiting planet candidates with $R_p > 1\,R_\oplus$ and
$P < 250$ d hosted by solar-type stars. The planet detection efficiency is computed using the measured noise and
the observed timespans of the light curves for ~120,000 Kepler target stars. Given issues with the parameters
of Kepler host stars and planet candidates (especially the unphysical impact parameter distribution reported
for the candidates), we focus on deriving the shape of the planet period and radius distribution functions. We find
that for orbital period $P > 10$ d, the planet frequency $dN_p/d\log P$ for "Neptune-size" planets ($R_p = 4$-$8\,R_\oplus$)
increases with period as $\propto P^{0.7\pm0.1}$. In contrast, $dN_p/d\log P$ for "Super-Earth-size" ($2$-$4\,R_\oplus$) as well as
"Earth-size" ($1$-$2\,R_\oplus$) planets is consistent with a nearly flat distribution as a function of period
($\propto P^{0.11\pm0.05}$ and $\propto P^{-0.10\pm0.12}$, respectively), and the normalizations are remarkably similar (within a factor of ~1.5). The
shape of the distribution function is found to be insensitive to changes in the selection criteria of the sample.
The implied nearly flat or rising planet frequency at long period appears to be in tension with the sharp decline
at ~100 d in planet frequency for low-mass planets (planet mass $m_p < 30\,M_\oplus$) recently suggested by the HARPS
survey.




1. INTRODUCTION 

The Kepler mission provides an unprecedented opportunity
to study the size and period distribution of extrasolar
planets down to Earth radii within yr-long orbits by making
high-precision (~$10^{-4}$), high-cadence (~30 min) and nearly
continuous monitoring of ~$10^5$ stars over years. Based on the
transiting planet candidates discovered from the first 4 months
of Kepler data (Borucki et al. 2011, hereafter B11), Howard
et al. (2012) (hereafter H12) made statistical inference of the
frequency for planets with radii $R_p > 2\,R_\oplus$. H12 found that
planet frequency increases for decreasing radii and that it
drops sharply for planets with very close-in orbits ($P < 10$ d).
They claimed that the Kepler planet frequency is consistent with
those found by Radial-Velocity (RV) surveys (Mayor et al.
2009; Howard et al. 2010). Several other studies also use the
B11 sample to study the planet distribution. Gould & Eastman
(2011) found that there is a break in the radius distribution of
B11 candidates at ~$3\,R_\oplus$. By extrapolating the detection
efficiency deduced by H12 and applying a maximum-likelihood
approach, Youdin (2011) fitted the distribution of B11
candidates down to $0.5\,R_\oplus$ and found a relative deficiency of ~$3\,R_\oplus$
planets at $P < 7$ days. Catanzarite & Shao (2011) and Traub
(2012) attempted to extrapolate the planet frequency obtained
from B11 candidates to estimate the fraction of Sun-like stars
that host habitable Earth-like planets.

The latest Kepler release, based on 16 months (quarters
Q1-Q6) of data (Batalha et al. 2012, hereafter B12), has
increased the number of known planet candidates by a factor of
~2 (from 1200 to ~2300). As expected, there is a large
gain in planet candidates at long periods as well as those with
small radii compared to those found in B11. But according
to B12, there is also a considerable gain relative
to B11 for short-period planets that is not expected merely from
increasing the length of the observing windows, and the implied
lower-than-expected efficiency of the planet search pipeline
employed by B11 may affect the above-mentioned statistical results.
One important improvement in B12 is that the Kepler team for
the first time stitches different quarters together in the
transit search, which especially increases the robustness of the search
for long-period planets. In fact, two independent automatic
planet searches on Q1-Q6 data by Huang et al. (2012) and Ofir
& Dreizler (2012), as well as crowd-sourced human identifications
by Planet Hunters (Schwamb et al. 2012), identify in
total only ~10% more new planet candidates than those found by
B12, suggesting that the B12 searches are likely highly efficient.

We derive planet frequency as a function of period and 
planet radius using Kepler planet candidates discovered by 
B12 as well as those found by other groups. Like the majority 
of the works on Kepler statistics to date, we do not distin- 
guish planet candidates from planets, i.e., we assume a low 
false positive rate (see § 5 for more discussion). The transit 
planet detection efficiency is calculated for each Kepler star 
using measured photometric noise of its light curve and the 
observed timespan (excluding gaps and missing quarters). In 
addition, the geometric bias for circular orbits is taken into 
account. Given some issues in the parameters for the planets 
and their host stars, we focus on determining the relative fre- 
quency for planets with various radii as a function of period 
for Sun-like hosts. We do not distinguish planets in single
or multiple transit systems to derive the planet multiplicity
function (Tremaine & Dong 2012). The Kepler planet frequency
derived below extends down to $R_p > 1\,R_\oplus$ and to periods up
to ~250 days. This can be compared with the planet frequency
inferred from RV searches by the 8-yr HARPS survey, which is
sensitive to long-period ($P > 100$ days) Super-Earths and
Neptunes with mass $M_p < 30\,M_\oplus$ (Mayor et al. 2011).

2. ISSUES WITH SELECTING THE KEPLER STAR AND PLANET 
SAMPLE 

2.1. The Kepler Input Catalog 
Planet frequency is usually defined with respect to an ensemble
of host stars that share similar physical properties.
Stellar type, metallicity, age and population may have impacts 
on the frequency of planets. The Kepler target stars were se- 
lected based on multi-band photometry (documented in the 
Kepler Input Catalog, KIC), and the selection was focused on 
finding solar-type stars to search for Earth analogues. 

The KIC photometry is most sensitive to the
effective temperature $T_{\rm eff}$, less reliable for constraining the
surface gravity $\log g$ (especially unreliable for cool stars) and has
little sensitivity to metallicity. We do not attempt to study the
planet frequency as a function of metallicity, which would
require comprehensive spectroscopic follow-up. The relatively
large uncertainty in $\log g$ may have a serious impact on the
frequency study. Unreliable $\log g$ estimates may introduce
ambiguity between dwarfs and sub-giants/giants with the same $T_{\rm eff}$.
Furthermore, errors in $\log g$ dominate the uncertainties in the
stellar radius measurement, which translate into uncertainty
in the planet radius since only planet-to-star radius ratios are
measured from transit light curves.

To study the uncertainty in $\log g$, we use the high-precision
stellar parameters derived from high-resolution spectroscopic
follow-up of more than a hundred Kepler planet host stars
by Buchhave et al. (2012). In the upper panel of Fig. 1,
the KIC $T_{\rm eff}$ and $\log g$ are plotted as black solid dots, and the
$\log g$ values from 104 spectroscopic measurements are plotted
at the ends of the red lines connecting from the KIC values.
The majority of the stars in the KIC have $T_{\rm eff}$ between 4500 K
and 6500 K, and we divide the stars into four equal bins in
temperature in this range. For each bin, the average difference
between the two sets of measurements, $\Delta\log g$, is within ±0.1,
with no strong systematic preference in sign. The average
dispersion is about 0.3 dex, except for the bin with 4500 K < $T_{\rm eff}$ <
5000 K, which has a dispersion of 0.4 dex. In the lower panel,
the histogram of $|\Delta\log g|$ is shown. For the bin with 4500 K <
$T_{\rm eff}$ < 5000 K, 50% of the stars have $|\Delta\log g| > 0.3$ dex, while
< 30% of the stars have $|\Delta\log g| > 0.3$ dex in the three other bins. It
seems the problem with $\log g$ uncertainty is most severe for
stars with $T_{\rm eff} < 5000$ K, and we choose not to include them in
our stellar sample for this study. The average dispersion in
$\log g$ for the chosen stellar sample is therefore 0.3 dex, which
translates into a 0.15 dex dispersion in the planet radius estimate
(since $R_* \propto g^{-1/2}$ at fixed stellar mass).
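
The conversion from a $\log g$ dispersion to a planet radius dispersion can be checked with a short calculation. The sketch below assumes the stellar mass is held fixed (so that $g = GM_*/R_*^2$ gives $\sigma_{\log R} = 0.5\,\sigma_{\log g}$) and that the planet radius scales linearly with the stellar radius; the function name is hypothetical.

```python
def radius_dex_from_logg_dex(sigma_logg_dex: float) -> float:
    """Dispersion in log10(R) implied by a dispersion in log10(g).

    Assumption: fixed stellar mass, so g = G M / R^2 and
    log R = 0.5 * (log(G M) - log g).
    """
    return 0.5 * sigma_logg_dex

# 0.3 dex in log g, the average dispersion quoted in the text
sigma_logR = radius_dex_from_logg_dex(0.3)
# The same dispersion expressed as a multiplicative factor in radius
radius_factor = 10.0 ** sigma_logR
print(sigma_logR, round(radius_factor, 2))
```

A 0.15 dex dispersion corresponds to a factor of roughly 1.4 in radius, which is why radius bins narrower than this would not be meaningful.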

B12 commented that a considerable fraction of KIC stellar
parameters were not consistent with known stellar physics.
They matched the KIC $T_{\rm eff}$, $\log g$ and [Fe/H] with Yonsei-Yale
isochrones (Demarque et al. 2004) by minimizing
$(\delta T_{\rm eff}/200\,{\rm K})^2 + (\delta\log g/0.3)^2 + (\delta{\rm [Fe/H]}/0.4)^2$, where $\delta$ is the
difference between the KIC and Yonsei-Yale parameters. They
reported the stellar parameters (and the derived planet
parameters) using the "corrected" values from Yonsei-Yale. We
note that the "corrected" stellar parameters do not match the
spectroscopic measurements from Buchhave et al. (2012) better
than the KIC values do. Nevertheless, they are at least self-consistent
for each star according to stellar physics (e.g., the
parameters match the mass-radius relation). We follow the procedure
of B12 and adopt the "corrected" parameters throughout the
paper. In Fig. 1, the "corrected" $T_{\rm eff}$ and $\log g$ are plotted
as yellow dots and the KIC values are shown as grey dots
(the Yonsei-Yale isochrone for 5 Gyr with solar metallicity
is shown in cyan). It is interesting to note that many stars
at 4500 K < $T_{\rm eff}$ < 5000 K have KIC $\log g$ values inconsistent
with any reasonable isochrones.
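
The matching step described above amounts to a nearest-point search in a scaled parameter space. The following is a minimal sketch, assuming the isochrones are available as a table of ($T_{\rm eff}$, $\log g$, [Fe/H]) rows; the toy grid values and the function name are hypothetical and are not real Yonsei-Yale data.

```python
import numpy as np

def match_to_isochrone(kic, grid, scales=(200.0, 0.3, 0.4)):
    """Return the grid row minimizing
    (dTeff/200 K)^2 + (dlogg/0.3)^2 + (d[Fe/H]/0.4)^2 and the minimum value."""
    kic = np.asarray(kic, dtype=float)
    grid = np.asarray(grid, dtype=float)
    chi2 = (((grid - kic) / np.asarray(scales)) ** 2).sum(axis=1)
    best = int(np.argmin(chi2))
    return grid[best], float(chi2[best])

# Hypothetical toy isochrone points: Teff [K], log g [dex], [Fe/H] [dex]
toy_grid = np.array([
    [5000.0, 4.60, 0.0],
    [5500.0, 4.50, 0.0],
    [5800.0, 4.44, 0.0],
    [6200.0, 4.30, 0.0],
])

# A KIC entry whose log g is too high for any point on the toy isochrone
corrected, chi2_min = match_to_isochrone((5750.0, 4.9, -0.2), toy_grid)
print(corrected, round(chi2_min, 3))
```

In this toy case the temperature barely moves while $\log g$ is pulled onto the isochrone, which mirrors the point made in the text that the "corrected" values are self-consistent rather than necessarily more accurate.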

Our stellar sample consists of Kepler stars with 5000 K <
$T_{\rm eff}$ < 6500 K (approximately corresponding to K2-F5 dwarfs)
and 4.0 < $\log g$ < 5.0. These limits are shown as a black box
in Fig. 1. We also exclude stars with Kepler magnitude $m_K > 16$,
which constitute a negligible fraction of Kepler stars and
have little sensitivity to planets. The sample includes a total
of $N_* = 122328$ stars.

2.2. Impact Parameter Distribution 

Only the planets whose orbits are oriented within a limited
range of inclination angles are observed to transit their host
stars. One basic assumption needed to make statistical inferences from
an ensemble of transiting planets is that the orbital inclinations of
planets are distributed randomly with respect to the observer.
Following this assumption, the impact parameters $b$, which
are the minimum projected planet-star separations during the
transits normalized by the radii of the stars, are distributed
uniformly for the observed transits. Then from the observed
transits, one may correct for the selection due to such
geometric conditions ("geometric bias") to take the number of
non-transiting planets into account. For circular orbits, the
geometric bias is $g_p = R_*/a_p$ for transits with $0 < b < 1$.
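
For orientation, the geometric bias can be evaluated from the period using Kepler's third law. This is a minimal sketch assuming a circular orbit and a planet mass negligible compared to the star; the function names and the optional `b_thres` scaling (the modification described later in the paper for an impact parameter cut) are illustrative choices.

```python
import math

G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30   # solar mass [kg]
R_SUN = 6.957e8    # solar radius [m]
DAY = 86400.0      # seconds per day

def semi_major_axis(period_days, mstar_msun=1.0):
    """Semi-major axis [m] from Kepler's third law (planet mass neglected)."""
    period_s = period_days * DAY
    return (G * mstar_msun * M_SUN * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)

def geometric_bias(period_days, rstar_rsun=1.0, mstar_msun=1.0, b_thres=1.0):
    """Transit probability g_p = R_*/a_p for a circular orbit, scaled by b_thres."""
    return b_thres * rstar_rsun * R_SUN / semi_major_axis(period_days, mstar_msun)

for period in (10.0, 100.0, 250.0):
    print(period, round(geometric_bias(period), 4))
```

For a Sun-like star the probability is about 5% at $P = 10$ d and falls as $P^{-2/3}$, so each detected long-period planet stands for many more non-transiting ones.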

The histogram of best-fit $b$ values for Kepler planets
reported by B12 is plotted in the upper left panel of Fig. 2, and the
posterior probability distribution considering Gaussian errors is plotted in
the upper right panel (in the latter case, the unphysical values
with $b < 0$ that arise from the Gaussian distribution are shown). The $b$
distribution is far from uniform, and it is highly skewed
toward high values ($b$ ~ 0.8-0.9). This unphysical
distribution cannot be explained by selection effects due to observational
thresholds (transits with low $b$ are easier to detect than those
with $b$ ~ 1, as the former generally have higher S/N). Note
that for candidates with high S/N (>100), the distribution is
less skewed toward $b$ ~ 1 but has a peak at $b = 0$ (see the bottom
right panel of Fig. 2). This is understandable: at low impact
parameter, the transit profile is hard to distinguish from that
at $b = 0$, and the fitting algorithm may assign $b = 0$ as the best fit.

One possible source of the unphysical distribution of $b$
skewed toward $b$ ~ 1 may be artifacts or biases introduced by
the fitting procedures employed by the Kepler team. One
possibility is that the finite integration time of the exposures
is not accounted for in the modeling (Kipping 2010, J. Loyld, B. Gaudi,
Private Communications). The other possible source may be
that some of the high-$b$ planets are false positives.

Resolving this discrepancy is beyond the scope of this
work. In the following analysis, we test whether the planet
samples with $b < 0.6$ and $b < 0.9$ result in different distribution
functions. Obviously, given the skewed $b$ distribution,
the normalization of the planet frequency differs considerably
between the two samples. We focus on understanding
whether the shape of the distribution function is affected by
the upper threshold of impact parameter, $b_{\rm thres}$.

3. PLANET DETECTION EFFICIENCY OF Kepler FROM DETECTION 
THRESHOLDS 

Besides the geometric selection discussed above, the other
main selection effect is the survey selection, which denotes
the incompleteness due to the detection thresholds of the
survey. A transit candidate is considered to be detected if (1) the
number of transit occurrences $N_{\rm tra}$ exceeds a threshold $N_{\rm tra,thres}$ and (2)
the total S/N of the transit signals is greater than a threshold
$S/N_{\rm thres}$. We discuss both detection thresholds in detail in the
following sub-sections.

To characterize the survey selection, we introduce the planet
detection efficiency $\epsilon(P, R_p)$, which is the fraction of stars in
the stellar sample for which a planet with period $P$ and radius
$R_p$ can be detected (i.e., the above two thresholds are
satisfied). For each star $i$ in a sample with a total of $N_*$ stars, the
noise $\sigma_i$ and the time window during which it is observed, $T_{w,i}$, are
known. For a hypothetical planet with $P$ and $R_p$ around this
star $i$, we calculate $N_{\rm tra}$ and S/N for 100 uniformly distributed
phases of the planet transits within the time window $T_{w,i}$. Then
we count the fraction of phases, $\eta_i$, for which both the $N_{\rm tra,thres}$ and
$S/N_{\rm thres}$ criteria are satisfied. Finally, we obtain the detection
efficiency for the planet in the sample by summing $\eta_i$ over all
the stars: $\epsilon(P, R_p) = \sum_i \eta_i / N_*$.
The intrinsic planet frequency $f_p$ is defined as

$$ f_p = \frac{d^2 N_p / N_*}{d\log_{10}P \; d\log_{10}R_p}, \qquad (1) $$

where $N_p$ is the intrinsic number of planets around $N_*$ host
stars. With both the detection efficiency $\epsilon(P, R_p)$ and the geometric
bias $g_p$ known, the intrinsic planet frequency $f_p$ can be
derived using the following relation,

$$ \frac{d^2 N_{p,{\rm det}}}{d\log_{10}P \; d\log_{10}R_p} = N_* f_p \, \epsilon(P, R_p) \, g_p, \qquad (2) $$

where $N_{p,{\rm det}}$ is the number of planets that pass the detection
thresholds.

In the following two subsections, we describe how we
calculate the two survey selection criteria: (1) the $N_{\rm tra}$ threshold and
(2) the S/N threshold.

3.1. $N_{\rm tra}$ Threshold

We include the effects of the transit window function, which is
important for the statistics of long-period planets (Gaudi 2000).
Out of the 122328 stars we have selected, ~68.5% have data
over all six quarters, and ~13%, 1.5%, 12.4%, 4% and 0.7% miss
1, 2, 3, 4 and 5 quarters, respectively. Over all 6 quarters, the
gaps between quarters and the artifacts amount to a total of
51.5 d, which is ~10.4% of the duration between the start of
Q1 and the end of Q6. B12 used the Transiting Planet Search
(TPS) module (Tenenbaum et al. 2012) as the primary
algorithm to search for periodic square pulses within Q1-Q6 and
then sought confirmations in Q7-Q8. Strictly speaking, the TPS
module finds transits with at least three occurrences
(Tenenbaum et al. 2012), but B12 include planet candidates with
fewer transit occurrences in their sample. Moreover, the
independent searches by Huang et al. (2012) and Ofir & Dreizler
(2012) over Q1-Q6, which include transits with fewer than 3 occurrences,
only yield 10% more candidates with no obvious preference
for long-period ones. For a detection, we adopt a transit
occurrence criterion of at least 2 transit occurrences in Q1-Q6,
so that the signal is periodic in this window, and 3 transit occurrences
in Q1-Q8, so that the detection is secure. We also vary this
criterion to demand 3 transit occurrences in Q1-Q6 to check
whether we obtain consistent planet statistics in § 5.

In order to evaluate the effect of window functions, for each
trial period, we make 100 simulations with the center of the
transits occurring at different times, which are evenly
distributed within the period. Then we record the number of
transit occurrences for each quarter in each simulation. In
Fig. 4, we show $f_{\rm window}$, the fraction of simulated transits that
satisfy the transit occurrence criterion, as a function of period.
The black line represents a star that has been observed over all
8 quarters: $f_{\rm window}$ starts to decrease from 100% at $P$ ~ 100 d
to 50% at $P$ ~ 250 d and then to 0 above ~340 d. We also show
in red an example in which one quarter (Q5) is missing, for
which $f_{\rm window}$ is typically ~10-20% smaller at long periods
and no transits satisfy the occurrence criterion at $P > 300$ d.
This emphasizes the importance of considering various transit
phases when deriving the frequency of planets with long periods
beyond 100 days.
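
The window-function simulation described above can be sketched as follows. The quarter boundaries used here are purely illustrative (eight 85-day quarters separated by 5-day gaps), not the actual Kepler observing log, and the function names are hypothetical; the occurrence criterion (at least 2 transits in Q1-Q6 and 3 in Q1-Q8) is the one adopted in the text.

```python
import numpy as np

def count_transits(period, phase, windows):
    """Number of transit centers falling inside the observed windows.

    Transits occur at t = (phase + k) * period for k = 0, 1, 2, ...
    with time measured in days from the start of the first window.
    """
    t_end = max(end for _, end in windows)
    times = np.arange(phase * period, t_end, period)
    return sum(int(np.sum((times >= s) & (times < e))) for s, e in windows)

def f_window(period, windows_q1q6, windows_q1q8, n_phase=100, n_min_q6=2, n_min_q8=3):
    """Fraction of evenly spaced transit phases meeting the occurrence criterion."""
    phases = (np.arange(n_phase) + 0.5) / n_phase
    ok = sum(
        1 for ph in phases
        if count_transits(period, ph, windows_q1q6) >= n_min_q6
        and count_transits(period, ph, windows_q1q8) >= n_min_q8
    )
    return ok / n_phase

# Illustrative quarters only: 85 d of data followed by a 5 d gap, eight times
QUARTERS = [(90.0 * k, 90.0 * k + 85.0) for k in range(8)]
full = f_window(250.0, QUARTERS[:6], QUARTERS)

# Same star with the fifth quarter missing
no_q5 = [q for k, q in enumerate(QUARTERS) if k != 4]
missing = f_window(250.0, no_q5[:5], no_q5)
print(full, missing)
```

Because removing a quarter can only lower the transit counts, the fraction for the star with a missing quarter never exceeds that of the fully observed star, which is the qualitative behavior of the red and black curves described above.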

3.2. S/N Threshold with Box-like Profile 

The statistics of Kepler planet frequency presented in this
work are made by (1) using a simple box-like transit profile for
both real and hypothetical planets, and (2) modeling the planet
detection threshold with a lower limit in transit signal-to-noise
ratio, S/N > $S/N_{\rm thres}$. This is the same assumption made by H12
and B11. The simple box-like transit profile is characterized
only by the depth $\delta$ of the transit and the transit duration $t_{\rm dur}$, with
the photometric error $\sigma$ for $t_{\rm dur}$. For each star, we have
calculated $\sigma$ in each individual quarter separately by interpolating
the CDPP values published by the Kepler team at 3 hr, 6 hr and
12 hr intervals to the desired transit duration $t_{\rm dur}$ (for a
description of CDPP see Christiansen et al. 2012; the CDPP
tables are downloaded from the official Kepler MAST site).
The total S/N from observing $N_{\rm tra}$ box-like transits is

$$ S/N = \sqrt{N_{\rm tra}} \, \frac{\delta}{\sigma}. \qquad (3) $$

The box-like transit profile applies in the limit of small
planet-to-star radius ratio ($R_p \ll R_*$), zero impact
parameter ($b = 0$), and uniform host star surface brightness profile
(no limb-darkening). In this limit, $\delta = (R_p/R_*)^2$ and $t_{\rm dur} =
R_* P/(\pi a)$ for a circular orbit. Both real and hypothetical planets
are considered to be detected when $S/N(R_p, P) > S/N_{\rm thres}$.
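
The box-model S/N can be illustrated with a small calculation. This is a sketch under stated assumptions: the CDPP values below are made up for illustration, the interpolation is taken to be linear in duration (the text does not specify the scheme), durations outside 3-12 hr are clamped to the nearest tabulated value by `np.interp`, and the function names are hypothetical.

```python
import math
import numpy as np

R_SUN_IN_REARTH = 109.2  # approximate solar radius in Earth radii

def transit_duration_hours(period_days, a_over_rstar):
    """Central transit duration t_dur = R_* P / (pi a) for a circular orbit."""
    return 24.0 * period_days / (math.pi * a_over_rstar)

def box_snr(rp_rearth, rstar_rsun, period_days, a_over_rstar, cdpp_ppm, n_tra):
    """S/N = sqrt(N_tra) * depth / sigma for a box-like transit."""
    depth = (rp_rearth / (rstar_rsun * R_SUN_IN_REARTH)) ** 2
    t_dur = transit_duration_hours(period_days, a_over_rstar)
    # Noise on the transit timescale from CDPP tabulated at 3, 6 and 12 hr
    sigma = np.interp(t_dur, [3.0, 6.0, 12.0], cdpp_ppm) * 1e-6
    return math.sqrt(n_tra) * depth / sigma

# Hypothetical star: CDPP of 60, 45, 35 ppm at 3, 6, 12 hr; 2 R_Earth planet,
# P = 10 d, a/R_* = 19.5 (roughly a Sun-like host), 50 observed transits
snr = box_snr(2.0, 1.0, 10.0, 19.5, (60.0, 45.0, 35.0), n_tra=50)
detected = snr > 8.0
print(round(snr, 1), detected)
```

With these made-up noise values the planet sits comfortably above the threshold of 8, while an Earth-size planet with a quarter of the depth would fall close to it.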

In this limit, the dependence of S/N on the impact parameter $b$
is ignored. In the experiments we carry out below where we
vary the upper threshold $b_{\rm thres}$ for the selection of the planet
sample, we simply modify the geometric bias to be $g_p \times b_{\rm thres}$.
In Dong & Zhu (in prep), we introduce a full framework that
takes the effects of limb-darkening and ingress/egress into
account. In that case, $b_{\rm thres}$ also introduces changes in the
detection efficiency $\epsilon$ since the S/N detection threshold depends on
$b$. Similar to Gould et al. (2006), we find that adding
limb-darkening and ingress/egress makes little difference in the
inferred distribution.

4. RESULTS 
4.1. Kepler Planet Frequency 

We first carry out the detection efficiency calculations
described above for a dense 100 x 40 grid of ($P$, $R_p$) with $P$ from
0.3 d to 500 d and $R_p$ from $0.5\,R_\oplus$ to $32\,R_\oplus$. The grids are
divided uniformly in log space for both $P$ and $R_p$. In the main
calculation, we choose $S/N_{\rm thres} = 8$ (see footnote 1), $N_{\rm tra,thres}$(Q1-Q6) = 2,
$N_{\rm tra,thres}$(Q1-Q8) = 3 and $b_{\rm thres} = 0.9$. All the thresholds
are varied in § 5 to make consistency checks. The resulting
detection efficiency $\epsilon$, and $N_{*,{\rm eff}} = N_* \times \epsilon \times g_p$, which
represents the planet sensitivity considering both detection
efficiency and geometric bias, are shown in the left and right
panels of Fig. 5, respectively. Beyond 100 days, Kepler's
sensitivity to detect $R_\oplus$ planets drops abruptly.

Footnote 1: Note that the S/N values we calculate above using CDPP are very close
to the Multiple Event Statistics (MES) values reported by B12. MES is the
quantity used by the main Kepler transit search algorithm TPS and resembles
the transit S/N for a periodic square-pulse search. MES is required to be greater
than 7.1 in the Kepler search. We adopt a higher
threshold of 8, which corresponds to the turnover of the right-hand panel of
Fig. 7 of Tenenbaum et al. (2012). The S/N of the transit fit reported by
B12 does not have the cut of 7.1 (with a minimum of 4) and is on average a
factor of ~2 higher than MES, with large variance in the ratio between the two
quantities. Throughout the paper, we use the S/N values calculated using
CDPP to closely mimic the transit detection processes employed by TPS.

Then, we divide the $P$ and $R_p$ plane into 15 x 4 bins, which
are uniformly distributed in $\log_{10}P$ and $\log_{10}R_p$, with $P$ from
0.75 d to 250 d and $R_p$ from $1\,R_\oplus$ to $16\,R_\oplus$ (see Figure 7). In
each bin, we take the detection efficiency as well as the
geometric bias into account and calculate $f_p(P, R_p)$ as defined in
Equation (2) and its 1-$\sigma$ uncertainty assuming a Poisson
distribution. In each bin, $f_p$ is assumed to be distributed
uniformly in $\log_{10}P$ and $\log_{10}R_p$. For bins in which
no planet is detected, we compute an upper limit at the 90%
confidence level. There are 2486 planet candidates in total, including
those from B12, Huang et al. (2012), and Ofir & Dreizler (2012); our
stellar parameter cuts limit the number of planets to 1801, and
1347 of these survive our detection threshold cut. The bins at the
lower right corners have the least secure statistics due to low
sensitivity in detecting planets and relatively large gradients
in the sensitivity. The sensitivity $N_{*,{\rm eff}} = N_* \times \epsilon \times g_p$ is plotted
as red lines in Figure 7.
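
The per-bin estimate can be sketched as follows. This is a minimal illustration, assuming a single effective number of stars $N_{*,{\rm eff}}$ per bin, a $\sqrt{N}$ approximation for the 1-$\sigma$ Poisson uncertainty, and the standard Poisson result that zero detections exclude expected counts above $-\ln(0.1) \approx 2.3$ at 90% confidence; the function name and the numbers are hypothetical.

```python
import math

def bin_frequency(n_det, n_star_eff, confidence=0.90):
    """Planets per star in one (P, R_p) bin.

    n_det      : number of detected candidates in the bin
    n_star_eff : N_* x efficiency x geometric bias, averaged over the bin
    Returns (frequency, 1-sigma error, upper limit); the upper limit is
    only set when no planet is detected.
    """
    if n_det == 0:
        # Poisson: P(0 | mu) = exp(-mu) = 1 - confidence  =>  mu = -ln(1 - confidence)
        upper = -math.log(1.0 - confidence) / n_star_eff
        return 0.0, None, upper
    freq = n_det / n_star_eff
    err = math.sqrt(n_det) / n_star_eff  # sqrt(N) approximation
    return freq, err, None

freq, err, _ = bin_frequency(25, 1000.0)
_, _, upper = bin_frequency(0, 1000.0)
print(freq, err, round(upper, 5))
```

The example shows why bins with low $N_{*,{\rm eff}}$ are the least secure: both the fractional error and the upper limit scale inversely with the effective number of stars.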

The intrinsic number of planets within each period and
planet radius bin ($N_p$ per star) is shown in Fig. 7. The planet
radius bins are $8$-$16\,R_\oplus$ ("Jupiter-size"), $4$-$8\,R_\oplus$ ("Neptune-size"),
$2$-$4\,R_\oplus$ ("Super-Earth-size") and $1$-$2\,R_\oplus$ ("Earth-size").
The bin size in $\log R_p$ (0.3 dex) is chosen to be larger
than the average dispersion in $\log R_p$ (0.15 dex) due to the
uncertainty in the KIC $\log g$ estimates. The above-mentioned bins
with the least secure statistics are plotted as dot-dashed lines.
These include the four longest period bins for Earth-size
planets ($1\,R_\oplus < R < 2\,R_\oplus$) and the longest period bin for
Super-Earth-size planets ($2\,R_\oplus < R < 4\,R_\oplus$).

We confirm the sharp drop below 10 days in planet
frequency for planets of all sizes identified by Howard et al. (2012).
Beyond 10 days, the most striking feature is that the frequency
of Neptune-size planets rises sharply while the smaller planets
with $R_p$ from $1$-$4\,R_\oplus$ have a frequency consistent with being flat
in $\log P$. Quantitatively, the frequency of Neptune-size planets
increases by a factor of ~5 from 10 days to 250 days. In
contrast, the frequencies of Earth-size and Super-Earth-size planets
are consistent with flat distributions in $\log P$ within the 1-2 $\sigma$ error
bars beyond 10 days. The frequency of Jupiter-size planets
increases more slowly compared to the rise of the Neptune-size
planets. These trends survive when several observational
cuts are varied (discussed in § 5.1), so they appear to be robust.

Next we show the cumulative planet frequency for planets
with different sizes in Figure 8. Within 250 days, Earth-size
and Super-Earth-size planets have almost the same cumulative
frequency, ~30%, which is 4 times larger than the Neptune-size
planet frequency (~7%), or ~10 times larger than the
Jupiter-size planet frequency (~2.5%). The total frequency
for all the planets from $1$-$16\,R_\oplus$ within 250 days is ~60%.
However, the absolute normalization is likely not robust as it
can vary by a factor as large as ~1.5 depending on various
cuts (in particular the impact parameter cut) as discussed in § 5.1
below.

We then show the planet frequency as a function of planet
size within three period bins (0.4-10, 10-50, 50-250 days) in
Fig. 9. There appears to be a clear evolution of the planet size
distribution as a function of period. At all periods, the
dominant population in number is the planets with small radii
($R_p < 4\,R_\oplus$). There are clear breaks in the distribution
function at ~$3\,R_\oplus$ and ~$10\,R_\oplus$. At the shortest periods (< 10 days),
below $3\,R_\oplus$, the planet frequency in $\log_{10}R_p$ increases slowly
toward small radii. After a relatively steep drop in frequency
at $3$-$4\,R_\oplus$, larger planets are consistent with a flat
distribution up to ~$12\,R_\oplus$. At longer periods (> 10 days),
below $3$-$4\,R_\oplus$, the distribution is consistent with being flat in
$\log_{10}R_p$ (or even consistent with slightly decreasing toward
small radii for the $P$ = 10-50 days bin). We caution that the
planet statistics presented here are the least secure for $1$-$2\,R_\oplus$
at $P > 50$ days. Within $P$ = 10-50 days, the planet frequency in
$\log_{10}R_p$ for planets larger than ~$3\,R_\oplus$ clearly decreases for
increasing radius up to ~$10\,R_\oplus$. In the bin with the longest
periods ($P$ = 50-250 days), for planets with $R_p = 3$-$10\,R_\oplus$, the
frequency distribution is nearly flat in $\log_{10}R_p$ up to ~$10\,R_\oplus$,
then it drops sharply at > $10\,R_\oplus$. Overall, at longer periods, the
relative frequency of big planets ($3\,R_\oplus < R < 10\,R_\oplus$)
compared to small planets ($1\,R_\oplus < R < 3\,R_\oplus$) becomes higher.

The method presented in this section has the advantage of
making no assumption about the functional form of the planet
distribution, but the data are binned, which carries the implicit
assumption that planets are distributed uniformly within the bins.
Thus, the results may depend on the bin size. We have tested
the effects of bin size by using bins that are a factor of 3
smaller, and the resulting trends in frequency are consistent
with those presented above.

4.2. The maximum likelihood method 

Motivated by the linear trends seen in the log-log plots in 
the period distribution for P > 10 days discussed in the pre- 
vious section, we model these trends with power-law depen- 
dencies in period using the maximum likelihood method. This 
approach has the advantage of requiring no binning. 

We follow Tabachnik & Tremaine (2002) and Youdin
(2011) to calculate the log likelihood function as

$$ \ln L = \sum_j \ln\left(\epsilon_j \, g_{p,j} \, f_p\right) - N_{\rm exp}, \qquad (4) $$

where the sum is taken over all the planet candidates, and $\epsilon_j$ and
$g_{p,j}$ are the detection efficiency and the geometric bias as
defined above. The intrinsic planet frequency $f_p$ is defined in Eq.
1 and the assumed analytical form is

$$ f_p = C \times (P/10\,{\rm days})^{\beta} \quad {\rm when}\ P > 10\,{\rm days}, \qquad (5) $$

where $\beta$ is also the slope of the intrinsic frequency in the
log-log plot. $N_{\rm exp}$ is the expected number of planets with the
assumed $f_p$,

$$ N_{\rm exp} = \int N_* \, \epsilon \, g_p \, f_p \; d\log_{10}P \; d\log_{10}R_p. \qquad (6) $$

We numerically maximize the log likelihood for planets
in each radius bin ($1$-$2$, $2$-$4$, $4$-$8$, $8$-$16\,R_\oplus$). The resulting
$C$ and $\beta$ are given in Table 2. Multiplying $f_p$ by the bin
size as in §4.1, we derive the planet frequency, which is
overplotted in the left panel of Figure 7 as the grey dashed lines.
Our maximum likelihood fits are fully consistent with the
trends in the distribution functions described in §4.1, confirming
our claims that planets at $1$-$4\,R_\oplus$ have a nearly flat
distribution in $\log_{10}P$ beyond 10 days, while planets at $4$-$8\,R_\oplus$ display
a fast increasing distribution in $\log_{10}P$ for increasing period
($\propto P^{0.7\pm0.1}$).
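
The structure of the likelihood in Equations (4)-(6) can be illustrated with a toy fit for a single radius bin. The sketch below assumes the sensitivity $N_* \epsilon g_p$ depends only on period within the bin, uses a made-up sensitivity curve and candidate periods, and maximizes over $\beta$ with a simple grid search; none of these choices are taken from the paper, and all function names are hypothetical.

```python
import numpy as np

def expected_number(C, beta, logP_grid, sens_grid, dlogR):
    """N_exp: trapezoidal integral of sens * f_p over log10 P, times the bin width in log10 R_p."""
    integrand = sens_grid * C * (10.0 ** logP_grid / 10.0) ** beta
    return dlogR * float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(logP_grid)))

def log_likelihood(C, beta, P_det, sens_det, logP_grid, sens_grid, dlogR):
    """ln L = sum_j ln(sens_j * f_p(P_j)) - N_exp, with f_p = C (P/10 d)^beta."""
    f_det = C * (P_det / 10.0) ** beta
    return float(np.sum(np.log(sens_det * f_det))) - expected_number(C, beta, logP_grid, sens_grid, dlogR)

def best_C(beta, n_det, logP_grid, sens_grid, dlogR):
    """For fixed beta, d(ln L)/dC = 0 gives C = n_det / N_exp(C = 1)."""
    return n_det / expected_number(1.0, beta, logP_grid, sens_grid, dlogR)

def sensitivity(P):
    """Made-up sensitivity N_* x eps x g_p, falling as P^(-2/3) like the geometric bias."""
    return 3000.0 * (P / 10.0) ** (-2.0 / 3.0)

# Toy inputs: candidate periods [d] in one radius bin of width 0.3 dex
P_det = np.array([12.0, 20.0, 35.0, 60.0, 110.0, 200.0])
logP_grid = np.linspace(1.0, np.log10(250.0), 200)
sens_grid = sensitivity(10.0 ** logP_grid)
dlogR = 0.3

betas = np.linspace(-1.0, 2.0, 61)
fits = [(log_likelihood(best_C(b, len(P_det), logP_grid, sens_grid, dlogR), b,
                        P_det, sensitivity(P_det), logP_grid, sens_grid, dlogR), b)
        for b in betas]
lnL_max, beta_hat = max(fits)
C_hat = best_C(beta_hat, len(P_det), logP_grid, sens_grid, dlogR)
print(beta_hat, C_hat)
```

Profiling out $C$ analytically leaves a one-dimensional search in $\beta$, which is convenient for a sketch; a real analysis would also derive uncertainties from the curvature or contours of $\ln L$.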

We assume power-law distributions with respect to planet 
period for planets in four different radii bins. We do not at- 
tempt to assign any analytical distribution function for planet 
radii to do a maximum likelihood fit to both planet period and 
radii, since Figure 9 suggests that the planet radius distribu- 
tion function is more complicated than simple power-law or 
broken power-law distribution. Figure 9 itself is more instruc- 
tive than such a multi-parameter representation. 
5. DISCUSSION
5.1. Varying Sample Selection Cuts 

We vary several sample selection cuts to test the robustness 
of the derived planet frequency. 

First we vary the detection thresholds: the $N_{\rm tra}$ and S/N
thresholds. An S/N threshold of 12 (8 is used for the main results) is
applied, and the results are shown in the upper left panel of
Fig. 10. Obviously smaller planets are more affected by
this new cut, and as a result, the statistical uncertainty
for Earth-size planets becomes much larger. Nevertheless, the
power-law index $\beta$ of the period distribution is in good agreement
with the main results with a lower S/N threshold. We also test
the case in which $N_{\rm tra}$ requires three transits from Q1-Q6, and the
results are shown in the upper left panel of Fig. 10. This cut
limits the number of planets in the longest period bin. Again,
$\beta$ is consistent with the main results. Next we choose only
the bright stars (Kepler magnitude $m_K < 14.5$) in our stellar
sample. These stars on average have less noise than the main
sample, thus the transits of small planets have higher S/N
ratios. The results for $\beta$ are fully consistent with the main ones.

Given the concern over the skewed $b$ distribution of Kepler
planet candidates as discussed in §2.2, we test the planet
frequency with a planet sample having $b < 0.6$. This cut causes
bigger changes than those in all previous tests. First, it leads
to lower planet frequencies (~60% relative to our fiducial
case) since this cut decreases the number of planets by a
factor of three while it should only decrease the planet sample
by a factor of 1/0.6 = 1.7 if the $b$ distribution were uniform.
Second, it alters the shape of the distribution for small planets
and at long period. For planets in both the $1$-$2\,R_\oplus$ and $2$-$4\,R_\oplus$
bins, the power-law index $\beta$ increases by 1-2$\sigma$ compared to the results
using the $b < 0.9$ cut. The power-law index for $4$-$8\,R_\oplus$
is 0.64 ± 0.19, so it is slightly smaller than the main result but
well within the uncertainty.

Our conclusion that, beyond ~10 days, small planets
(esp. Super-Earth-size planets) have a nearly flat distribution
and Neptune-size planets show a fast rising distribution
appears to be robust against our various cuts.

5.2. False Positives & Blending 

Astrophysical false positives for planet transit candidates 
usually involve various scenarios of blending with eclipsing 
binaries. Only a small fraction of Kepler planet candidates 
have been confirmed by RV (or transit timing variations). It is 
unlikely that a significant fraction of Kepler candidates will 
be confirmed by RV given most of them are hosted by rel- 
atively dim stars and have too low masses to be followed 
up by RV for existing facilities. Thus so far the false pos- 
itive rates for Kepler candidates are mostly estimated sta- 
tistically rather than from direct measurements. Lissauer et 
al. (2012) estimated that ^98% of the planets candidates in 
multi-transiting systems are not due to false positives. Early 
statistical estimate on the overall Kepler sample according 
to Galactic models and stellar population synthesis by Mor- 
ton & Johnson (201 1) claimed that Kepler candidates have a 
low rate (< 10%) of false positives. However, Santerne et al. 
(2012) found that ^ 35% of candidates are due to false pos- 
itives by following up 46 Jupiter-size planet candidates with 
P < 25 days from Bll sample. The discrepancy with Mor- 
ton & Johnson (2011) is probably because Morton & John- 
son (2011) did not take M-dwarf eclipsing binary into ac- 
count and assumed a vetting procedure more stringent than 



that applied in B11 (e.g., removing the suspicious V-shaped
transits, which was not done in B11 but was done in B12). Another
possible source of discrepancy is that Morton & Johnson
(2011) assumed a hierarchical triple fraction of 6%, but this
fraction is nearly an order of magnitude higher for inner binaries
with short periods (Tokovinin et al. 2006), which is relevant
to the close-in giant planet candidate sample of Santerne et al.
(2012). Note that these sources of discrepancy are most applicable
to short-period Jupiter-size planet candidates, which
compose a small fraction of Kepler planet candidates. The
impact parameter distribution skewed toward 1, discussed in
§ 2.2, may also signal possible false-positive contamination.
In this work, we assume a low false-positive rate
and do not distinguish between planet candidates and planets.
Known false positives are removed prior to the analysis. Our
main conclusions on the shape of the distribution functions could
be compromised if the false-positive fraction is significant and the
false-positive rates depend considerably on planet radius and
period. Systematic efforts to estimate false-positive rates,
such as BLENDER (Torres et al. 2011) and Morton (2012),
may help to clarify this issue in the future. We also ignore
the effects of significant blending in the light curve (Seager
& Mallén-Ornelas 2003). The primary effect of blending is
to dilute the transit depth, and as a result, the planet radius
can be underestimated. In addition, derived transit parameters
such as the impact parameter can also be altered by blending.
A rough estimate indicates that blending is probably insignificant
for most Kepler targets (T. Morton, private communication).
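
The dilution effect described above can be illustrated with a minimal sketch. It assumes that the transit occurs on the target star, that the transit depth scales as (Rp/R*)^2 (limb darkening neglected), and that the blended source contributes a constant extra flux. The function names and the parameter blend_flux_ratio are hypothetical and are not taken from the analysis in this paper.

```python
import math


def diluted_depth(true_depth, blend_flux_ratio):
    """Observed transit depth when a blended source adds constant light.

    blend_flux_ratio = (flux of blended source) / (flux of target star).
    This parameterization is an assumption of the sketch.
    """
    if blend_flux_ratio < 0:
        raise ValueError("blend_flux_ratio must be non-negative")
    return true_depth / (1.0 + blend_flux_ratio)


def inferred_to_true_radius(blend_flux_ratio):
    """Ratio of inferred to true planet radius, using depth ~ (Rp/R*)^2."""
    return 1.0 / math.sqrt(1.0 + blend_flux_ratio)


if __name__ == "__main__":
    true_depth = 1.0e-3  # illustrative value only
    for ratio in (0.0, 0.1, 1.0):
        print(ratio, diluted_depth(true_depth, ratio), inferred_to_true_radius(ratio))
```

Under these assumptions, an equal-brightness blend halves the observed depth and leads to a radius underestimated by a factor of sqrt(2), while a 10% contaminant changes the inferred radius by only about 5%.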

5.3. Comparison with Previous Work 

Our approach to computing detection efficiency is similar to
that undertaken in H12, while our stellar sample is a factor of
~2 larger than the main sample in H12 and the planet sample
is a factor of ~3 larger. Importantly, the B12 planet candidate
sample we use is derived from a longer observing span (Q1-Q6) than
the B11 sample used by H12, and the improved planet detection
algorithm in B12 is likely much more efficient than that of
B11 and probably has a high level of completeness up to
~250 days. We have also considered the effect of the observing
window function, which is essential for studying the statistics
of long-period planets. With these improvements, we are able
to probe a larger parameter space (Rp > 1 R⊕, P < 250 days)
than H12 (Rp > 2 R⊕, P < 50 days). In the overlapping
parameter space, our results are consistent with H12.
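
To illustrate why the window function matters at long periods, the following is a minimal sketch of one way to estimate the fraction of transit phases for which enough transits fall on observed data. It is not the implementation used in this paper: transits are treated as instantaneous, the minimum number of transits n_min is a free parameter, and the baseline and gap values are hypothetical.

```python
def window_function(period, t_start, t_end, gaps, n_min=3, n_phase=2000):
    """Fraction of transit phases with at least n_min transits on observed data.

    gaps is a list of (gap_start, gap_end) intervals with no data.
    All names and defaults here are assumptions of this sketch.
    """
    n_pass = 0
    for i in range(n_phase):
        # Epoch of the first transit, stepped uniformly over one period.
        t = t_start + (i + 0.5) * period / n_phase
        n_seen = 0
        while t <= t_end:
            in_gap = any(g0 <= t <= g1 for g0, g1 in gaps)
            if not in_gap:
                n_seen += 1
            t += period
        if n_seen >= n_min:
            n_pass += 1
    return n_pass / n_phase


if __name__ == "__main__":
    # Hypothetical 500-day baseline with one 30-day gap (illustrative only).
    gaps = [(235.0, 265.0)]
    for period in (10.0, 100.0, 200.0, 400.0):
        print(period, window_function(period, 0.0, 500.0, gaps))
```

In this toy setup the fraction is unity for short periods and falls toward zero once fewer than n_min transits fit into the baseline, which is the regime where the correction becomes important.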

We may also compare with the frequency of small planets
from RV surveys (Mayor et al. 2011). A detailed comparison
would require modeling the mass-radius relation, which
has large uncertainty for the majority of Kepler planets of
interest. We only attempt a tentative comparison of
the broad features and general trends. Mayor et al. (2011)
found that more than 50% of solar-type stars host "at least one
planet of any mass" within ~100 days. This is broadly consistent
with our result that 50% of Kepler solar-type stars
host planets with Rp > 1 R⊕ and P < 100 days. Mayor et
al. (2011) have also suggested that the frequency of planets
with Mp sin i < 30 M⊕ may drop sharply for P > 100 days,
although they caution that this could be an artifact of selection
bias (see the red histogram of Fig. 14 and the discussion
in Sec. 4.4 of their paper). Therefore, it is of interest
to determine whether there is evidence for a parallel drop in
planet frequency in the Kepler data. We focus on the planet radius bin
2 < Rp/R⊕ < 4, which probably contains a large fraction of
the planets in the mass bin considered by Mayor et al. (2011). After
correcting for incompleteness, Mayor et al. (2011) found
that planet frequency drops by a factor of ~3.5 from the period
bin [56, 100] days to [100, 160] days. To be specific, we ask
how many planets would be expected in our 100 < P < 160 day bin if
the underlying frequency fell by a factor of 3.5 at this boundary.
We find that ~7 planets would be expected while 23 are actually
detected, a difference too large to be attributed to Poisson fluctuations.
Therefore, the available Kepler data appear to be in tension
with the 100-day frequency drop suggested by Mayor et
al. (2011). Future Kepler data releases should be able to test
this claim definitively by probing small planets at longer periods.
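
The size of the discrepancy quoted above can be checked with a short sketch that evaluates the Poisson probability of observing at least 23 planets when ~7 are expected. It treats the expected number as exactly 7 and ignores its uncertainty, which is a simplifying assumption; the function name is hypothetical.

```python
import math


def poisson_tail(n_obs, mean, n_terms=200):
    """P(N >= n_obs) for a Poisson distribution with the given mean.

    The tail is summed directly (in log space per term) to avoid the
    cancellation that occurs when computing 1 - CDF for small tails.
    """
    total = 0.0
    for k in range(n_obs, n_obs + n_terms):
        total += math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    return total


if __name__ == "__main__":
    p = poisson_tail(23, 7.0)
    print(f"P(N >= 23 | mean = 7) = {p:.2e}")
```

Under this assumption the tail probability is of order 10^-6, so a simple Poisson fluctuation is a very unlikely explanation for the excess.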

5.4. Implications 

The planet distribution in period and radius presented in
this paper may bear the imprints of planet formation, migration,
dynamical evolution, and possibly other physical processes
(e.g., Ida & Lin 2004; Mordasini et al. 2012; Kenyon
& Bromley 2006; Lopez et al. 2012). B11 and H12 found a
sharp decline in planet frequency below ~10 days. Our analysis
of planets with longer periods reveals that at P > 10 days,
planets of all sizes appear to follow smooth power-law distributions
up to 250 days: either a nearly flat distribution in
log P for small planets (< 4 R⊕) or a rising distribution for
larger planets (> 4 R⊕). In particular, the frequency of
Neptune-size planets (Rp = 4-8 R⊕) increases significantly with
period from ~10 to 200 days. We are not aware of any [...]
regarded as evidence for core-accretion formation scenarios.



We thank Andy Gould and Boaz Katz for carefully reading 
the manuscript and making helpful comments. We are grate- 
ful for useful discussions with Fred Adams, Scott Gaudi, Lee 
Hartmann, Chelsea Huang, Jennifer Johnson, David Kipping, 
James Lloyd, Tim Morton, Dave Spiegel, and Scott Tremaine. 
S.D. was supported through a Ralph E. and Doris M. Hansmann
Membership at the IAS and NSF grant AST-0807444.
Work by S.D. was performed under contract with the California
Institute of Technology (Caltech) funded by NASA through
the Sagan Fellowship Program. Z.Z. was supported by NSF
grant AST-0908269 and Princeton University.



Lopez, E. D., Fortney, J. J., & Miller, N. 2012, ApJ, 761, 59
Mayor, M., Udry, S., Lovis, C., et al. 2009, A&A, 493, 639
Mayor, M., Marmier, M., Lovis, C., et al. 2011, arXiv:1109.2497
Mordasini, C., Alibert, Y., Georgy, C., et al. 2012, A&A, 547, A112
Morton, T. D., & Johnson, J. A. 2011, ApJ, 738, 170
Morton, T. D. 2012, ApJ, 761, 6
Ofir, A., & Dreizler, S. 2012, arXiv:1206.5347
Santerne, A., Díaz, R. F., Moutou, C., et al. 2012, A&A, 545, A76
Schwamb, M. E., Lintott, C. J., Fischer, D. A., et al. 2012, ApJ, 754, 129
Seager, S., & Mallén-Ornelas, G. 2003, ApJ, 585, 1038
Tabachnik, S., & Tremaine, S. 2002, MNRAS, 335, 151
Tenenbaum, P., Jenkins, J., Seader, S., et al. 2012, arXiv:1212.2915
Tokovinin, A., Thomas, S., Sterzik, M., & Udry, S. 2006, A&A, 450, 681
Torres, G., Fressin, F., Batalha, N. M., et al. 2011, ApJ, 727, 24
Traub, W. A. 2012, ApJ, 745, 20
Tremaine, S., & Dong, S. 2012, AJ, 143, 94
Youdin, A. N. 2011, ApJ, 742, 38



[Fig. 1 plot area. Lower-panel axis label: |Δ lg(g)|, with ticks at 0.5 and 1.]

Fig. 1. Upper panel: Teff and log g from the KIC catalog for all Kepler target stars (grey dots), as well as the "corrected" stellar parameters derived by
matching Yonsei-Yale isochrones following the approach in B12 (yellow dots) [see § 2.1 for detailed discussion]. The cyan line is the Yonsei-Yale isochrone for
solar age at solar metallicity. We also highlight 104 stars with accurate stellar parameters derived from high-resolution spectroscopic follow-up by Buchhave
et al. (2012). The KIC Teff and log g for these stars are plotted as black solid dots, while the spectroscopically measured log g values are plotted
at the ends of the red lines connecting from the KIC values. Lower panels: we divide the 104 stars into four temperature bins and calculate the difference
between the KIC and spectroscopically measured log g values. The average dispersion is about 0.3 dex, except for the lowest temperature bin (4500-5000 K), and we
exclude stars with Teff < 5000 K from our stellar sample. The selected stellar sample lies within the black box, based on the "corrected" parameters shown in the upper
panel.
Fig. 2. Histogram of best-fit impact parameter (b) values for Kepler planet candidates reported by B12 (upper left panel) and the posterior probability
distribution considering Gaussian errors (upper right panel). Clearly, the reported b is highly skewed toward high values (~1), especially for the candidates
with lower S/N (bottom left panel). This distribution is unphysical (see the discussion in § 2.2). We also divide the sample into those with lower (<100)
and higher (>100) S/N in the bottom panels. For candidates with higher S/N, the distribution is less skewed toward ~1 but has a peak at 0. This is understandable because at low
impact parameter the transit profile is hard to distinguish from that at b = 0, and the fitting algorithm may assign b = 0 as the best fit.
Fig. 3. An example transit light curve demonstrating the importance of the window function for long-period transits. Quarter gaps lie between the blue lines,
while other gaps (Table 1) are marked by the red boxes. The arrow indicates the transits, and one of the transits happens to fall into the quarter gap between Q3
and Q4.




[Fig. 4 plot area. Axis label: P (d), with ticks at 100, 200, 300, and 400.]

Fig. 4. The window function f_window, defined as the fraction of simulated transits that satisfy the transit occurrence criterion, as a function of period. The
black line represents a star that has been observed over all 8 quarters. If Q5 is missing, f_window is plotted as the red curve. f_window is important for deriving the
frequency of planets with long periods (> 100 days).
Fig. 5. Left panel: the derived detection efficiency for our stellar sample. Right panel: the detection sensitivity, combining the geometric bias and the
detection efficiency. The contours show log10 of the number of planets that could be detected if every star in our sample had a planet at the given Rp and P.
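
As a rough illustration of how the geometric bias and the detection efficiency combine into a sensitivity of the kind shown in Fig. 5, the sketch below multiplies a geometric transit probability by an efficiency and a number of stars. It assumes circular orbits, a planet much smaller than its host star, identical solar-type hosts, and a single uniform efficiency value. All function names and input values are hypothetical, and this is not the per-star calculation used in the paper.

```python
R_SUN_AU = 0.00465047  # solar radius in astronomical units


def semi_major_axis_au(period_days, m_star=1.0):
    """Kepler's third law for a planet of negligible mass (m_star in solar masses)."""
    return (m_star * (period_days / 365.25) ** 2) ** (1.0 / 3.0)


def geometric_transit_probability(period_days, r_star=1.0, m_star=1.0, b_max=1.0):
    """Probability that a randomly oriented circular orbit transits with b < b_max."""
    return b_max * r_star * R_SUN_AU / semi_major_axis_au(period_days, m_star)


def expected_detections(n_stars, period_days, efficiency, b_max=0.9):
    """Planets detectable if every star hosted one planet at this period."""
    return n_stars * geometric_transit_probability(period_days, b_max=b_max) * efficiency


if __name__ == "__main__":
    # Illustrative inputs only: 1e5 stars and a uniform 50% detection efficiency.
    for period in (10.0, 50.0, 250.0):
        print(period, expected_detections(1.0e5, period, 0.5))
```

The geometric term falls as P^(-2/3), so even with perfect detection efficiency the number of detectable planets per star drops steeply toward long periods.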




Fig. 6. The planet frequency as a function of P and Rp. All selected planet candidates are overplotted, including those from B12 (blue dots), Huang et al. (2012) (green
dots), and Ofir & Dreizler (2012) (yellow dots). The planet sensitivity shown in the right panel of Fig. 5 is also plotted. The lower right corner, marked with the
small grid, has the least secure statistics since the sensitivity is low and the gradient in sensitivity is relatively large.
Fig. 7. The intrinsic number of planets per star in different planet radius and period bins, plotted as a function of period. The histograms with error bars in
various colors represent different planet-radius bins (red: 1-2 R⊕, blue: 2-4 R⊕, green: 4-8 R⊕, magenta: 8-16 R⊕, black: 1-16 R⊕). The dot-dashed parts
of the histograms are for the bins with the least secure statistics, corresponding to the bins marked with small grids in Fig. 6.
The maximum-likelihood best-fit power laws as a function of period for planets beyond 10 days in each planet radius bin are
overplotted as grey dashed lines. For orbital period P > 10 d, the planet frequency dNp/dlogP for "Neptune-size" planets (Rp = 4-8 R⊕) increases with period as
∝ P^(0.7±0.1). In contrast, dNp/dlogP for "Super-Earth-size" (2-4 R⊕) as well as "Earth-size" (1-2 R⊕) planets is consistent with a nearly flat distribution as a
function of period (∝ P^(0.11±0.05) and ∝ P^(-0.10±0.12), respectively), and the normalizations are remarkably similar (within a factor of ~1.5). See § 4.1 for
detailed discussion.
Fig. 8. The cumulative distribution of the intrinsic number of planets per star within period P for planets in different radius bins. The color scheme is the same as
in Fig. 7.
Fig. 9. The number of planets per star as a function of planet radius for planets in different period bins (blue: 0.4-10 days, red: 10-50 days, green: 50-250 days,
black: < 250 days). The size distribution evolves considerably with period. There appear to be clear breaks in the size distribution functions
at 3 and 10 R⊕. The fraction of big planets (3-10 R⊕) relative to small planets (1-3 R⊕) increases with period. See § 4.1 for detailed discussion.
Fig. 10. Results of tests varying the cuts in sample selection. The results are presented in the same way as in Fig. 7. Lower left: stellar sample cut
at Kepler magnitude mK < 14.5 (rather than 16 for the main analysis). Lower right: impact parameter cut of b < 0.6 (rather than b < 0.9 for the main analysis).
Upper left and upper right: planet detection threshold cuts (number of Q1-Q6 transits larger than 3 rather than 2 on the left; S/N > 12 rather than 8 on
the right). See § 5.1 for discussion.
TABLE 1
Gaps in Kepler Light Curves

Gap start    Gap end      Note
...          352.3651     Gap Q3 & Q4
382.9368     385.7300
396.3515     403.0000
442.212P     443.4785     Gap Q4 & Q5
...          477.8000
503.4133     505.0200
538.1713     539.4398     Gap Q5 & Q6
566.0423     568.9000
597.7961     601.4000

a For the stars that were only observed for part of Q4 due to the malfunction of the CCD, the gap start extends to 373.2282.



TABLE 2

Power-law Fits to Kepler Planet Frequency with Periods from 10 days to 250 days, with fp(P, Rp) = C × (P/10 days)^β. fp is defined in Equation (1).

Planet radii (R⊕)    C              β (period power)
1-2                  0.66±0.08      -0.10±0.12
2-4                  0.49±0.03      0.11±0.05
4-8                  0.040±0.008    0.70±0.1
8-16                 0.023±0.007    0.50±0.17

TABLE 3

Similar to Table 2, with fp(P, Rp) = C × (P/10 days)^β, except with varying cuts in sample selection. fp is defined in Equation (1).

Planet radii (R⊕)    C              β (period power)

S/N > 12
1-2                  0.69±0.10      -0.14±0.2
2-4                  0.48±0.03      0.16±0.06
4-8                  0.040±0.008    0.70±0.12
8-16                 0.023±0.007    0.50±0.17

Number of transits > 3
1-2                  0.66±0.08      -0.11±0.13
2-4                  0.48±0.03      0.15±0.07
4-8                  0.038±0.008    0.76±0.13
8-16                 0.024±0.007    0.45±0.2

mK < 14.5
1-2                  0.51±0.07      -0.06±0.15
2-4                  0.52±0.06      0.10±0.08
4-8                  0.046±0.015    0.66±0.19
8-16                 0.028±0.015    0.35±0.31

b < 0.6
1-2                  0.33±0.06      0.25±0.17
2-4                  0.23±0.03      0.25±0.1
4-8                  0.025±0.008    0.64±0.19
8-16                 0.025±0.01     0.37±0.23



